Goto

Collaborating Authors

 offensive language





Counterspeech for Mitigating the Influence of Media Bias: Comparing Human and LLM-Generated Responses

Lin, Luyang, Feng, Zijin, Wang, Lingzhi, Wong, Kam-Fai

arXiv.org Artificial Intelligence

Biased news contributes to societal polarization and is often reinforced by hostile reader comments, constituting a vital yet often overlooked aspect of news dissemination. Our study reveals that offensive comments support biased content, amplifying bias and causing harm to targeted groups or individuals. Counterspeech is an effective approach to counter such harmful speech without violating freedom of speech, helping to limit the spread of bias. To the best of our knowledge, this is the first study to explore counterspeech generation in the context of news articles. We introduce a manually annotated dataset linking media bias, offensive comments, and counterspeech. We conduct a detailed analysis showing that over 70\% offensive comments support biased articles, amplifying bias and thus highlighting the importance of counterspeech generation. Comparing counterspeech generated by humans and large language models, we find model-generated responses are more polite but lack the novelty and diversity. Finally, we improve generated counterspeech through few-shot learning and integration of news background information, enhancing both diversity and relevance.


Detecting Hope, Hate, and Emotion in Arabic Textual Speech and Multi-modal Memes Using Large Language Models

AlDahoul, Nouar, Zaki, Yasir

arXiv.org Artificial Intelligence

The rise of social media and online communication platforms has led to the spread of Arabic textual posts and memes as a key form of digital expression. While these contents can be humorous and informative, they are also increasingly being used to spread offensive language and hate speech. Consequently, there is a growing demand for precise analysis of content in Arabic text and memes. This paper explores the potential of large language models to effectively identify hope, hate speech, offensive language, and emotional expressions within such content. We evaluate the performance of base LLMs, fine-tuned LLMs, and pre-trained embedding models. The evaluation is conducted using a dataset of Arabic textual speech and memes proposed in the ArabicNLP MAHED 2025 challenge. The results underscore the capacity of LLMs such as GPT-4o-mini, fine-tuned with Arabic textual speech, and Gemini Flash 2.5, fine-tuned with Arabic memes, to deliver the superior performance. They achieve up to 72.1%, 57.8%, and 79.6% macro F1 scores for tasks 1, 2, and 3, respectively, and secure first place overall in the Mahed 2025 challenge. The proposed solutions offer a more nuanced understanding of both text and memes for accurate and efficient Arabic content moderation systems.


Overcoming Low-Resource Barriers in Tulu: Neural Models and Corpus Creation for OffensiveLanguage Identification

D, Anusha M, Vikram, Deepthi, Chakravarthi, Bharathi Raja, Hegde, Parameshwar R

arXiv.org Artificial Intelligence

Tulu, a low-resource Dravidian language predominantly spoken in southern India, has limited computational resources despite its growing digital presence. This study presents the first benchmark dataset for Offensive Language Identification (OLI) in code-mixed Tulu social media content, collected from YouTube comments across various domains. The dataset, annotated with high inter-annotator agreement (Krippendorff's alpha = 0.984), includes 3,845 comments categorized into four classes: Not Offensive, Not Tulu, Offensive Untargeted, and Offensive Targeted. We evaluate a suite of deep learning models, including GRU, LSTM, BiGRU, BiLSTM, CNN, and attention-based variants, alongside transformer architectures (mBERT, XLM-RoBERTa). The BiGRU model with self-attention achieves the best performance with 82% accuracy and a 0.81 macro F1-score. Transformer models underperform, highlighting the limitations of multilingual pretraining in code-mixed, under-resourced contexts. This work lays the foundation for further NLP research in Tulu and similar low-resource, code-mixed languages.


Offensive Language Detection on Social Media Using XLNet

Alothman, Reem, Benhidour, Hafida, Kerrache, Said

arXiv.org Artificial Intelligence

The widespread use of text-based communication on social media-through chats, comments, and microblogs-has improved user interaction but has also led to an increase in offensive content, including hate speech, racism, and other forms of abuse. Due to the enormous volume of user-generated content, manual moderation is impractical, which creates a need for automated systems that can detect offensive language. Deep learning models, particularly those using transfer learning, have demonstrated significant success in understanding natural language through large-scale pretraining. In this study, we propose an automatic offensive language detection model based on XLNet, a generalized autoregressive pretraining method, and compare its performance with BERT (Bidirectional Encoder Representations from Transformers), which is a widely used baseline in natural language processing (NLP). Both models are evaluated using the Offensive Language Identification Dataset (OLID), a benchmark Twitter dataset that includes hierarchical annotations. Our experimental results show that XLNet outperforms BERT in detecting offensive content and in categorizing the types of offenses, while BERT performs slightly better in identifying the targets of the offenses. Additionally, we find that oversampling and undersampling strategies are effective in addressing class imbalance and improving classification performance. These findings highlight the potential of transfer learning and XLNet-based architectures to create robust systems for detecting offensive language on social media platforms.


K/DA: Automated Data Generation Pipeline for Detoxifying Implicitly Offensive Language in Korean

Jeon, Minkyeong, Jeong, Hyemin, Kim, Yerang, Kim, Jiyoung, Cho, Jae Hyeon, Lee, Byung-Jun

arXiv.org Artificial Intelligence

Language detoxification involves removing toxicity from offensive language. While a neutral-toxic paired dataset provides a straightforward approach for training detoxification models, creating such datasets presents several challenges: i) the need for human annotation to build paired data, and ii) the rapid evolution of offensive terms, rendering static datasets quickly outdated. To tackle these challenges, we introduce an automated paired data generation pipeline, called K/DA. This pipeline is designed to generate offensive language with implicit offensiveness and trend-aligned slang, making the resulting dataset suitable for detoxification model training. We demonstrate that the dataset generated by K/DA exhibits high pair consistency and greater implicit offensiveness compared to existing Korean datasets, and also demonstrates applicability to other languages. Furthermore, it enables effective training of a high-performing detoxification model with simple instruction fine-tuning.


Context-Aware Content Moderation for German Newspaper Comments

Krejca, Felix, Kietreiber, Tobias, Buchelt, Alexander, Neumaier, Sebastian

arXiv.org Artificial Intelligence

The increasing volume of online discussions requires advanced automatic content moderation to maintain responsible discourse. While hate speech detection on social media is well-studied, research on German-language newspaper forums remains limited. Existing studies often neglect platform-specific context, such as user history and article themes. This paper addresses this gap by developing and evaluating binary classification models for automatic content moderation in German newspaper forums, incorporating contextual information. Using LSTM, CNN, and ChatGPT-3.5 Turbo, and leveraging the One Million Posts Corpus from the Austrian newspaper Der Standard, we assess the impact of context-aware models. Results show that CNN and LSTM models benefit from contextual information and perform competitively with state-of-the-art approaches. In contrast, ChatGPT's zero-shot classification does not improve with added context and underperforms.


Stronger Together: Unleashing the Social Impact of Hate Speech Research

Wong, Sidney

arXiv.org Artificial Intelligence

The advent of the internet has been both a blessing and a curse for once marginalised communities. When used well, the internet can be used to connect and establish communities crossing different intersections; however, it can also be used as a tool to alienate people and communities as well as perpetuate hate, misinformation, and disinformation especially on social media platforms. We propose steering hate speech research and researchers away from pre-existing computational solutions and consider social methods to inform social solutions to address this social problem. In a similar way linguistics research can inform language planning policy, linguists should apply what we know about language and society to mitigate some of the emergent risks and dangers of anti-social behaviour in digital spaces. We argue linguists and NLP researchers can play a principle role in unleashing the social impact potential of linguistics research working alongside communities, advocates, activists, and policymakers to enable equitable digital inclusion and to close the digital divide.